{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# CNKI_Selenium"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "from lxml.html import fromstring\n",
    "import time\n",
    "from random import random\n",
    "import requests\n",
    "import base64"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Set the download directory in preparation for the file downloads below\n",
    "from selenium import webdriver\n",
    "options = webdriver.ChromeOptions()\n",
    "out_path = r'C:\\Users\\46547\\Desktop\\web_ming_final'  # the directory you want downloads saved to\n",
    "prefs = {'profile.default_content_settings.popups': 0, 'download.default_directory': out_path}\n",
    "options.add_experimental_option('prefs', prefs)\n",
    "# Pass the options via `options=`; the `chrome_options=` keyword is deprecated\n",
    "browser = webdriver.Chrome(executable_path=r'C:\\Users\\46547\\AppData\\Local\\Google\\Chrome\\Application\\chromedriver.exe', options=options)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "from selenium import webdriver\n",
    "from selenium.webdriver.common.desired_capabilities import DesiredCapabilities\n",
    "\n",
    "opts = webdriver.ChromeOptions()\n",
    "opts.add_argument('--no-sandbox')  # avoids the DevToolsActivePort-file-not-found error\n",
    "opts.add_argument('--window-size=1920,3000')  # set the browser window size (width,height)\n",
    "opts.add_argument('--disable-gpu')  # Google's documentation mentions adding this flag to work around a bug\n",
    "opts.add_argument('--hide-scrollbars')  # hide scrollbars, for pages that need it\n",
    "\n",
    "\n",
    "# `options=` replaces the deprecated `chrome_options=` keyword\n",
    "driver = webdriver.Chrome(options=opts)  # desired_capabilities=caps"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* If you are on the campus network, go straight to [CNKI](https://www.cnki.net/)\n",
    "* If you are off campus, log in through the [external access system](http://fsso.cnki.net/)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Check whether you are logged in"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "driver.get(\"https://www.cnki.net/\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "# driver.find_element_by_xpath('//*[@id=\"headerBox\"]/div[1]/div/div/div[4]').click()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Campus network: direct login"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "# # Log in via the campus network IP\n",
    "# element = driver.find_element_by_id('Button2')\n",
    "# element.click()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'中山大学南...'"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "element = driver.find_element_by_id('Ecp_loginShowName1')\n",
    "element.get_attribute('innerHTML')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Check the window handles"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
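    "* When several windows are open, check which window the driver is currently on\n",
    "* The driver automatically assigns each window a unique window ID (handle)"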
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Click Advanced Search (opens a new window)\n",
    "element = driver.find_elements_by_xpath('//div[@class=\"readvce\"]/a')[0]\n",
    "element.click()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['CDwindow-831891B401192DCD264D5199DFB72E58',\n",
       " 'CDwindow-4031B999C6810C2D4D3E02ADCC839FE6']"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Inspect the window handles (two windows are open now)\n",
    "driver.window_handles"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'CDwindow-831891B401192DCD264D5199DFB72E58'"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "driver.current_window_handle"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Switch to the newly opened window; `switch_to.window` replaces the deprecated `switch_to_window`\n",
    "driver.switch_to.window(driver.window_handles[1])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Enter search terms directly in the search fields"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "search_index = {\"theme\": \"智慧物联网\", \"author\": \"\", \"literature\": \"\"}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Clear the subject input\n",
    "driver.find_element_by_xpath('//*[@id=\"gradetxt\"]/dd[1]/div[2]/input').clear()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "driver.find_element_by_xpath('//*[@id=\"gradetxt\"]/dd[1]/div[2]/input').send_keys(search_index['theme'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Clear the author input\n",
    "driver.find_element_by_xpath('//*[@id=\"gradetxt\"]/dd[2]/div[2]/input').clear()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "driver.find_element_by_xpath('//*[@id=\"gradetxt\"]/dd[2]/div[2]/input').send_keys(search_index['author'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Clear the literature-source input\n",
    "driver.find_element_by_xpath('//*[@id=\"gradetxt\"]/dd[3]/div[2]/input').clear()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "driver.find_element_by_xpath('//*[@id=\"gradetxt\"]/dd[3]/div[2]/input').send_keys(search_index['literature'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [],
   "source": [
    "element = driver.find_element_by_xpath('//input[@value=\"检索\"]')\n",
    "element.click()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Click the journal search and select the journal indexes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Click the Journals tab\n",
    "driver.find_element_by_xpath('//ul[@class=\"doctype-menus keji\"]/li/a').click()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Tick the CSSCI checkbox\n",
    "driver.find_element_by_xpath('//input[@key=\"CSI\"]').click()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Tick the PKU Core Journals (北大核心) checkbox\n",
    "driver.find_element_by_xpath('//input[@key=\"HX\"]').click()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "driver.find_element_by_xpath('/html/body/div[2]/div/div[2]/div/div[1]/div[1]/div[2]/div[2]/input').click()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
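    "# Fill in the query\n",
    "* You can search directly in Advanced Search (as long as you do not need exact matching)\n",
    "* Professional Search is recommended"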
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Click the Journals tab\n",
    "driver.find_element_by_xpath('//ul[@class=\"doctype-menus keji\"]/li/a').click()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Click Professional Search\n",
    "driver.find_element_by_name('majorSearch').click()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "AI_新媒体_query = '(TI=\"物联网\" and SU=\"智能\") OR (TI=\"AI\" and SU = \"科学\")'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "element = driver.find_element_by_xpath('//textarea')\n",
    "element.clear()\n",
    "element.send_keys(AI_新媒体_query)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "# driver.find_element_by_xpath('/html/body/div[4]/div/div[2]/div/div[1]/div[2]/dl/dd[1]/p').get_attribute('innerHTML')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Tick the CSSCI checkbox\n",
    "driver.find_element_by_xpath('//input[@key=\"CSI\"]').click()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Tick the PKU Core Journals (北大核心) checkbox\n",
    "driver.find_element_by_xpath('//input[@key=\"HX\"]').click()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "driver.find_element_by_xpath('/html/body/div[2]/div/div[2]/div/div[1]/div[1]/div[2]/div[2]/input').click()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Run the search (show 50 results per page + select all)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Open the results-per-page dropdown\n",
    "element = driver.find_element_by_xpath('//*[@id=\"perPageDiv\"]/div')\n",
    "element.click()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Choose 50 results per page\n",
    "element = driver.find_element_by_xpath('//*[@id=\"perPageDiv\"]/ul/li[3]')\n",
    "element.click()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Get the page content"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Unnamed: 0</th>\n",
       "      <th>篇名</th>\n",
       "      <th>作者</th>\n",
       "      <th>刊名</th>\n",
       "      <th>发表时间</th>\n",
       "      <th>被引</th>\n",
       "      <th>下载</th>\n",
       "      <th>操作</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>基于物联网技术的高校化学实验室安全监管系统的设计与实现</td>\n",
       "      <td>高文红; 孙欢; 韩晓敏; 陈剑波</td>\n",
       "      <td>实验技术与管理</td>\n",
       "      <td>2021-06-24 17:18</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>人工智能范式的革命与通用智能理论的创生  网络首发</td>\n",
       "      <td>钟义信</td>\n",
       "      <td>智能系统学报</td>\n",
       "      <td>2021-06-22 15:09</td>\n",
       "      <td>NaN</td>\n",
       "      <td>146.0</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>基于工业物联网的智能矿山基础信息采集关键技术与平台</td>\n",
       "      <td>贺耀宜; 刘丽静; 赵立厂; 周李兵</td>\n",
       "      <td>工矿自动化</td>\n",
       "      <td>2021-06-21 15:36</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>基于北斗系统的物联网技术与应用</td>\n",
       "      <td>谢军; 庄建楼; 康成斌</td>\n",
       "      <td>南京航空航天大学学报</td>\n",
       "      <td>2021-06-15</td>\n",
       "      <td>NaN</td>\n",
       "      <td>77.0</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>基于知识图谱的区块链物联网领域研究分析</td>\n",
       "      <td>李嘉明; 赵阔; 屈挺; 刘晓翔</td>\n",
       "      <td>计算机科学</td>\n",
       "      <td>2021-06-15</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>6</td>\n",
       "      <td>科学数据智能:人工智能在科学发现中的机遇与挑战  网络首发</td>\n",
       "      <td>孟小峰</td>\n",
       "      <td>中国科学基金</td>\n",
       "      <td>2021-06-11 18:01</td>\n",
       "      <td>NaN</td>\n",
       "      <td>465.0</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>7</td>\n",
       "      <td>运用科学计量学的人工智能安全技术评估</td>\n",
       "      <td>吴集; 梁江海; 刘书雷</td>\n",
       "      <td>国防科技大学学报</td>\n",
       "      <td>2021-06-07</td>\n",
       "      <td>NaN</td>\n",
       "      <td>102.0</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>8</td>\n",
       "      <td>人工智能与物联网在大气科学领域中的应用  网络首发</td>\n",
       "      <td>张敬林;薛珂;杨智鹏;张峰;张人禾</td>\n",
       "      <td>地球物理学进展</td>\n",
       "      <td>2021-05-31 14:24</td>\n",
       "      <td>NaN</td>\n",
       "      <td>716.0</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>9</td>\n",
       "      <td>基于物联网监控的烟叶精准种植管理系统设计与实践  网络首发</td>\n",
       "      <td>农英雄;陆瑛;陈智斌;黄聪;黄崇峻</td>\n",
       "      <td>中国烟草学报</td>\n",
       "      <td>2021-05-26 13:10</td>\n",
       "      <td>NaN</td>\n",
       "      <td>119.0</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>10</td>\n",
       "      <td>基于母猪饲喂专家系统的群养母猪智能饲喂物联网系统设计</td>\n",
       "      <td>潘秦; 刘星桥</td>\n",
       "      <td>黑龙江畜牧兽医</td>\n",
       "      <td>2021-05-20</td>\n",
       "      <td>NaN</td>\n",
       "      <td>51.0</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>11</td>\n",
       "      <td>人工智能生物学:未来中医药现代化研究重要战略资源和竞争热点  网络首发</td>\n",
       "      <td>陆茵;韦忠红;邹伟;宋梦瑶</td>\n",
       "      <td>南京中医药大学学报</td>\n",
       "      <td>2021-05-19 10:15</td>\n",
       "      <td>NaN</td>\n",
       "      <td>295.0</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>12</td>\n",
       "      <td>司法人工智能融入司法改革的难题与路径</td>\n",
       "      <td>魏斌</td>\n",
       "      <td>现代法学</td>\n",
       "      <td>2021-05-15</td>\n",
       "      <td>NaN</td>\n",
       "      <td>350.0</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>13</td>\n",
       "      <td>文艺批评：人工智能及其挑战</td>\n",
       "      <td>刘建平</td>\n",
       "      <td>学术界</td>\n",
       "      <td>2021-05-15</td>\n",
       "      <td>NaN</td>\n",
       "      <td>86.0</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>14</td>\n",
       "      <td>人工智能与区块链赋能物联网：发展与展望</td>\n",
       "      <td>李萌;裴攀;孙恩昌;杨睿哲;司鹏搏</td>\n",
       "      <td>北京工业大学学报</td>\n",
       "      <td>2021-05-10</td>\n",
       "      <td>NaN</td>\n",
       "      <td>281.0</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>15</td>\n",
       "      <td>上海科技大学信息科学与技术学院周勇课题组在物联网空中计算和边缘计算研究中取得进展</td>\n",
       "      <td>NaN</td>\n",
       "      <td>信息网络安全</td>\n",
       "      <td>2021-05-10</td>\n",
       "      <td>NaN</td>\n",
       "      <td>17.0</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>16</td>\n",
       "      <td>基于场景生态的人工智能社会影响整合分析框架</td>\n",
       "      <td>苏竣; 魏钰明; 黄萃</td>\n",
       "      <td>科学学与科学技术管理</td>\n",
       "      <td>2021-05-10</td>\n",
       "      <td>NaN</td>\n",
       "      <td>155.0</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>17</td>\n",
       "      <td>人工智能+教育融合的困境与出路——复杂系统科学视角</td>\n",
       "      <td>徐莉; 梁震; 杨丽乐</td>\n",
       "      <td>中国电化教育</td>\n",
       "      <td>2021-05-08</td>\n",
       "      <td>NaN</td>\n",
       "      <td>569.0</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>18</td>\n",
       "      <td>后疫情时代教育创新发展的新视域与中国卓越探索——出席“2020全球人工智能与教育大数据大会”的思考</td>\n",
       "      <td>陈丽; 任萍萍; 张文梅</td>\n",
       "      <td>中国电化教育</td>\n",
       "      <td>2021-05-08</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1131.0</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>19</td>\n",
       "      <td>物联网技术在老年糖尿病病人健康管理中的应用现状</td>\n",
       "      <td>李鑫;孙坤;张文;张先庚;梁小利</td>\n",
       "      <td>护理研究</td>\n",
       "      <td>2021-05-08</td>\n",
       "      <td>NaN</td>\n",
       "      <td>483.0</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>20</td>\n",
       "      <td>面向电力物联网的5G通信认知无线电NOMA系统研究</td>\n",
       "      <td>佘蕊;张宁池;王艳茹;郭丹丹;马文洁</td>\n",
       "      <td>中国电力</td>\n",
       "      <td>2021-05-05</td>\n",
       "      <td>NaN</td>\n",
       "      <td>85.0</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>21</td>\n",
       "      <td>教育人工智能的哲学意蕴</td>\n",
       "      <td>尹璐; 安维复; 刘进</td>\n",
       "      <td>高教探索</td>\n",
       "      <td>2021-05-05</td>\n",
       "      <td>NaN</td>\n",
       "      <td>85.0</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>22</td>\n",
       "      <td>基于高阶Packet Tracer的温室智能物联网系统仿真研究  网络首发</td>\n",
       "      <td>王永红; 王诗瑶</td>\n",
       "      <td>河南农业科学</td>\n",
       "      <td>2021-04-30 13:34</td>\n",
       "      <td>NaN</td>\n",
       "      <td>145.0</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>23</td>\n",
       "      <td>信任能降低公众对人工智能技术的风险感知吗？  网络首发</td>\n",
       "      <td>朱依娜; 何光喜</td>\n",
       "      <td>科学学研究</td>\n",
       "      <td>2021-04-26 08:30</td>\n",
       "      <td>NaN</td>\n",
       "      <td>264.0</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>24</td>\n",
       "      <td>农业物联网技术现状与发展趋势</td>\n",
       "      <td>聂鹏程; 张慧; 耿洪良; 王铮; 何勇</td>\n",
       "      <td>浙江大学学报(农业与生命科学版)</td>\n",
       "      <td>2021-04-25 14:13</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1568.0</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>25</td>\n",
       "      <td>魔法与科学:人工智能的教育迷思及其祛魅</td>\n",
       "      <td>杨欣</td>\n",
       "      <td>教育学报</td>\n",
       "      <td>2021-04-25</td>\n",
       "      <td>NaN</td>\n",
       "      <td>76.0</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>26</td>\n",
       "      <td>“证实规律”与“阐释意义”:人工智能时代教育研究范式的两种旨趣</td>\n",
       "      <td>王洪才; 田芬</td>\n",
       "      <td>西北师大学报(社会科学版)</td>\n",
       "      <td>2021-04-22 17:56</td>\n",
       "      <td>NaN</td>\n",
       "      <td>230.0</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>27</td>\n",
       "      <td>电力物联网终端安全防护研究综述  网络首发</td>\n",
       "      <td>苏盛; 汪干; 刘亮; 陈清清; 王坤</td>\n",
       "      <td>高电压技术</td>\n",
       "      <td>2021-04-20 09:54</td>\n",
       "      <td>NaN</td>\n",
       "      <td>274.0</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>28</td>\n",
       "      <td>“物联网+人工智能”：Web3.0时代的数字传媒发展初探</td>\n",
       "      <td>冉凌宇</td>\n",
       "      <td>出版广角</td>\n",
       "      <td>2021-04-15</td>\n",
       "      <td>NaN</td>\n",
       "      <td>231.0</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>29</td>\n",
       "      <td>一种面向物联网的智能反射面通信系统优化方法</td>\n",
       "      <td>李苗钰; 杜忠昊; 刘雨彤; 牛思莹</td>\n",
       "      <td>西北工业大学学报</td>\n",
       "      <td>2021-04-15</td>\n",
       "      <td>NaN</td>\n",
       "      <td>118.0</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>30</td>\n",
       "      <td>基于物联网云平台的TIG/MIG智能控制系统设计与应用  网络首发</td>\n",
       "      <td>刘万存; 袁亮文; 高永光</td>\n",
       "      <td>热加工工艺</td>\n",
       "      <td>2021-04-12 09:42</td>\n",
       "      <td>NaN</td>\n",
       "      <td>93.0</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>30</th>\n",
       "      <td>31</td>\n",
       "      <td>区块链赋能的高效物联网数据激励共享方案</td>\n",
       "      <td>蔡婷; 林晖; 陈武辉; 郑子彬; 余阳</td>\n",
       "      <td>软件学报</td>\n",
       "      <td>2021-04-08</td>\n",
       "      <td>NaN</td>\n",
       "      <td>283.0</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>31</th>\n",
       "      <td>32</td>\n",
       "      <td>从教学样式到学习范式：人工智能环境下学习的通用设计转化</td>\n",
       "      <td>杨绪辉</td>\n",
       "      <td>中国电化教育</td>\n",
       "      <td>2021-04-08</td>\n",
       "      <td>NaN</td>\n",
       "      <td>328.0</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>32</th>\n",
       "      <td>33</td>\n",
       "      <td>数学地球科学跨越发展的十年：大数据、人工智能算法正在改变地质学</td>\n",
       "      <td>周永章;左仁广;刘刚;袁峰;毛先成</td>\n",
       "      <td>矿物岩石地球化学通报</td>\n",
       "      <td>2021-04-07 11:49</td>\n",
       "      <td>NaN</td>\n",
       "      <td>465.0</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>33</th>\n",
       "      <td>34</td>\n",
       "      <td>教育领域技术原始创新的历史、逻辑与未来——兼论人工智能的教育意蕴</td>\n",
       "      <td>周子荷</td>\n",
       "      <td>开放教育研究</td>\n",
       "      <td>2021-04-01</td>\n",
       "      <td>NaN</td>\n",
       "      <td>315.0</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>34</th>\n",
       "      <td>35</td>\n",
       "      <td>物联网+区块链的饲料供应链金融信息服务平台</td>\n",
       "      <td>黄巍; 唐友</td>\n",
       "      <td>吉林农业大学学报</td>\n",
       "      <td>2021-03-31 10:56</td>\n",
       "      <td>1.0</td>\n",
       "      <td>423.0</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>35</th>\n",
       "      <td>36</td>\n",
       "      <td>基于LoRa物联网的智能节水灌溉系统</td>\n",
       "      <td>刘书伦; 彭高辉; 贾宝华</td>\n",
       "      <td>北方园艺</td>\n",
       "      <td>2021-03-30</td>\n",
       "      <td>NaN</td>\n",
       "      <td>254.0</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>36</th>\n",
       "      <td>37</td>\n",
       "      <td>人工智能可以通过操纵波来实现吗?</td>\n",
       "      <td>何清波; 姜添曦</td>\n",
       "      <td>上海交通大学学报</td>\n",
       "      <td>2021-03-28</td>\n",
       "      <td>NaN</td>\n",
       "      <td>23.0</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>37</th>\n",
       "      <td>38</td>\n",
       "      <td>区块链与物联网视角下的供应链金融模式创新研究</td>\n",
       "      <td>严振亚</td>\n",
       "      <td>新疆社会科学</td>\n",
       "      <td>2021-03-25</td>\n",
       "      <td>NaN</td>\n",
       "      <td>106.0</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>38</th>\n",
       "      <td>39</td>\n",
       "      <td>从痴呆、AI到外星人：科幻电影中的弱智想象</td>\n",
       "      <td>黄鸣奋; 游长冬</td>\n",
       "      <td>江西师范大学学报(哲学社会科学版)</td>\n",
       "      <td>2021-03-25</td>\n",
       "      <td>NaN</td>\n",
       "      <td>16.0</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>39</th>\n",
       "      <td>40</td>\n",
       "      <td>人工智能时代的情报学发展与创新——基于情报交流理论的视角</td>\n",
       "      <td>丁波涛</td>\n",
       "      <td>情报学报</td>\n",
       "      <td>2021-03-24</td>\n",
       "      <td>NaN</td>\n",
       "      <td>209.0</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>40</th>\n",
       "      <td>41</td>\n",
       "      <td>基于BIM、物联网技术的智能管控平台在火灾报警系统中的应用</td>\n",
       "      <td>路永明; 孟繁欣; 李嘉琪; 林子阳; 王宁波</td>\n",
       "      <td>水利水电技术(中英文)</td>\n",
       "      <td>2021-03-20</td>\n",
       "      <td>NaN</td>\n",
       "      <td>83.0</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>41</th>\n",
       "      <td>42</td>\n",
       "      <td>智能管道物联网网络层构建技术</td>\n",
       "      <td>刘桂志</td>\n",
       "      <td>油气储运</td>\n",
       "      <td>2021-03-18 16:35</td>\n",
       "      <td>NaN</td>\n",
       "      <td>114.0</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>42</th>\n",
       "      <td>43</td>\n",
       "      <td>精准智能理论:面向复杂动态对象的人工智能</td>\n",
       "      <td>郑志明; 吕金虎; 韦卫; 唐绍婷</td>\n",
       "      <td>中国科学:信息科学</td>\n",
       "      <td>2021-03-18 08:50</td>\n",
       "      <td>NaN</td>\n",
       "      <td>188.0</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>43</th>\n",
       "      <td>44</td>\n",
       "      <td>基于区块链的物联网智能终端协作计算方案</td>\n",
       "      <td>查煜坤; 智慧; 房小彤</td>\n",
       "      <td>北京邮电大学学报</td>\n",
       "      <td>2021-03-16 11:27</td>\n",
       "      <td>NaN</td>\n",
       "      <td>176.0</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>44</th>\n",
       "      <td>45</td>\n",
       "      <td>人工智能时代算法风险的法律规制论纲</td>\n",
       "      <td>胡小伟</td>\n",
       "      <td>湖北大学学报(哲学社会科学版)</td>\n",
       "      <td>2021-03-16</td>\n",
       "      <td>1.0</td>\n",
       "      <td>839.0</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>45</th>\n",
       "      <td>46</td>\n",
       "      <td>基于多协议的温室智能物联网系统研究</td>\n",
       "      <td>王永红; 王诗瑶</td>\n",
       "      <td>北方园艺</td>\n",
       "      <td>2021-03-15</td>\n",
       "      <td>NaN</td>\n",
       "      <td>100.0</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>46</th>\n",
       "      <td>47</td>\n",
       "      <td>人工智能背景下农村企业财务管理水平提升的路径研究</td>\n",
       "      <td>曲海娟</td>\n",
       "      <td>农业经济</td>\n",
       "      <td>2021-03-15</td>\n",
       "      <td>NaN</td>\n",
       "      <td>92.0</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>47</th>\n",
       "      <td>48</td>\n",
       "      <td>论影视创作中人工智能预测技术的应用</td>\n",
       "      <td>诸廉</td>\n",
       "      <td>中州学刊</td>\n",
       "      <td>2021-03-15</td>\n",
       "      <td>NaN</td>\n",
       "      <td>74.0</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>48</th>\n",
       "      <td>49</td>\n",
       "      <td>基于物联网的智能传感器技术及其应用</td>\n",
       "      <td>孟峰; 张磊; 赵子未; 洪维</td>\n",
       "      <td>工矿自动化</td>\n",
       "      <td>2021-03-15</td>\n",
       "      <td>NaN</td>\n",
       "      <td>136.0</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>49</th>\n",
       "      <td>50</td>\n",
       "      <td>智能鱼菜共生装置的设计与试验研究——基于物联网远程控制</td>\n",
       "      <td>赵立军;李强;陈爽;李胜伶;徐鹏</td>\n",
       "      <td>农机化研究</td>\n",
       "      <td>2021-03-12</td>\n",
       "      <td>NaN</td>\n",
       "      <td>263.0</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    Unnamed: 0                                                 篇名  \\\n",
       "0            1                        基于物联网技术的高校化学实验室安全监管系统的设计与实现   \n",
       "1            2                          人工智能范式的革命与通用智能理论的创生  网络首发   \n",
       "2            3                          基于工业物联网的智能矿山基础信息采集关键技术与平台   \n",
       "3            4                                    基于北斗系统的物联网技术与应用   \n",
       "4            5                                基于知识图谱的区块链物联网领域研究分析   \n",
       "5            6                      科学数据智能:人工智能在科学发现中的机遇与挑战  网络首发   \n",
       "6            7                                 运用科学计量学的人工智能安全技术评估   \n",
       "7            8                          人工智能与物联网在大气科学领域中的应用  网络首发   \n",
       "8            9                      基于物联网监控的烟叶精准种植管理系统设计与实践  网络首发   \n",
       "9           10                         基于母猪饲喂专家系统的群养母猪智能饲喂物联网系统设计   \n",
       "10          11                人工智能生物学:未来中医药现代化研究重要战略资源和竞争热点  网络首发   \n",
       "11          12                                 司法人工智能融入司法改革的难题与路径   \n",
       "12          13                                      文艺批评：人工智能及其挑战   \n",
       "13          14                                人工智能与区块链赋能物联网：发展与展望   \n",
       "14          15           上海科技大学信息科学与技术学院周勇课题组在物联网空中计算和边缘计算研究中取得进展   \n",
       "15          16                              基于场景生态的人工智能社会影响整合分析框架   \n",
       "16          17                          人工智能+教育融合的困境与出路——复杂系统科学视角   \n",
       "17          18  后疫情时代教育创新发展的新视域与中国卓越探索——出席“2020全球人工智能与教育大数据大会”的思考   \n",
       "18          19                            物联网技术在老年糖尿病病人健康管理中的应用现状   \n",
       "19          20                          面向电力物联网的5G通信认知无线电NOMA系统研究   \n",
       "20          21                                        教育人工智能的哲学意蕴   \n",
       "21          22              基于高阶Packet Tracer的温室智能物联网系统仿真研究  网络首发   \n",
       "22          23                        信任能降低公众对人工智能技术的风险感知吗？  网络首发   \n",
       "23          24                                     农业物联网技术现状与发展趋势   \n",
       "24          25                                魔法与科学:人工智能的教育迷思及其祛魅   \n",
       "25          26                    “证实规律”与“阐释意义”:人工智能时代教育研究范式的两种旨趣   \n",
       "26          27                              电力物联网终端安全防护研究综述  网络首发   \n",
       "27          28                       “物联网+人工智能”：Web3.0时代的数字传媒发展初探   \n",
       "28          29                              一种面向物联网的智能反射面通信系统优化方法   \n",
       "29          30                  基于物联网云平台的TIG/MIG智能控制系统设计与应用  网络首发   \n",
       "30          31                                区块链赋能的高效物联网数据激励共享方案   \n",
       "31          32                        从教学样式到学习范式：人工智能环境下学习的通用设计转化   \n",
       "32          33                    数学地球科学跨越发展的十年：大数据、人工智能算法正在改变地质学   \n",
       "33          34                   教育领域技术原始创新的历史、逻辑与未来——兼论人工智能的教育意蕴   \n",
       "34          35                              物联网+区块链的饲料供应链金融信息服务平台   \n",
       "35          36                                 基于LoRa物联网的智能节水灌溉系统   \n",
       "36          37                                   人工智能可以通过操纵波来实现吗?   \n",
       "37          38                             区块链与物联网视角下的供应链金融模式创新研究   \n",
       "38          39                              从痴呆、AI到外星人：科幻电影中的弱智想象   \n",
       "39          40                       人工智能时代的情报学发展与创新——基于情报交流理论的视角   \n",
       "40          41                      基于BIM、物联网技术的智能管控平台在火灾报警系统中的应用   \n",
       "41          42                                     智能管道物联网网络层构建技术   \n",
       "42          43                               精准智能理论:面向复杂动态对象的人工智能   \n",
       "43          44                                基于区块链的物联网智能终端协作计算方案   \n",
       "44          45                                  人工智能时代算法风险的法律规制论纲   \n",
       "45          46                                  基于多协议的温室智能物联网系统研究   \n",
       "46          47                           人工智能背景下农村企业财务管理水平提升的路径研究   \n",
       "47          48                                  论影视创作中人工智能预测技术的应用   \n",
       "48          49                                  基于物联网的智能传感器技术及其应用   \n",
       "49          50                        智能鱼菜共生装置的设计与试验研究——基于物联网远程控制   \n",
       "\n",
       "                         作者                 刊名              发表时间   被引      下载  \\\n",
       "0         高文红; 孙欢; 韩晓敏; 陈剑波            实验技术与管理  2021-06-24 17:18  NaN     NaN   \n",
       "1                       钟义信             智能系统学报  2021-06-22 15:09  NaN   146.0   \n",
       "2        贺耀宜; 刘丽静; 赵立厂; 周李兵              工矿自动化  2021-06-21 15:36  NaN     NaN   \n",
       "3              谢军; 庄建楼; 康成斌         南京航空航天大学学报        2021-06-15  NaN    77.0   \n",
       "4          李嘉明; 赵阔; 屈挺; 刘晓翔              计算机科学        2021-06-15  NaN     NaN   \n",
       "5                       孟小峰             中国科学基金  2021-06-11 18:01  NaN   465.0   \n",
       "6              吴集; 梁江海; 刘书雷           国防科技大学学报        2021-06-07  NaN   102.0   \n",
       "7         张敬林;薛珂;杨智鹏;张峰;张人禾            地球物理学进展  2021-05-31 14:24  NaN   716.0   \n",
       "8         农英雄;陆瑛;陈智斌;黄聪;黄崇峻             中国烟草学报  2021-05-26 13:10  NaN   119.0   \n",
       "9                   潘秦; 刘星桥            黑龙江畜牧兽医        2021-05-20  NaN    51.0   \n",
       "10            陆茵;韦忠红;邹伟;宋梦瑶          南京中医药大学学报  2021-05-19 10:15  NaN   295.0   \n",
       "11                       魏斌               现代法学        2021-05-15  NaN   350.0   \n",
       "12                      刘建平                学术界        2021-05-15  NaN    86.0   \n",
       "13        李萌;裴攀;孙恩昌;杨睿哲;司鹏搏           北京工业大学学报        2021-05-10  NaN   281.0   \n",
       "14                      NaN             信息网络安全        2021-05-10  NaN    17.0   \n",
       "15              苏竣; 魏钰明; 黄萃         科学学与科学技术管理        2021-05-10  NaN   155.0   \n",
       "16              徐莉; 梁震; 杨丽乐             中国电化教育        2021-05-08  NaN   569.0   \n",
       "17             陈丽; 任萍萍; 张文梅             中国电化教育        2021-05-08  NaN  1131.0   \n",
       "18         李鑫;孙坤;张文;张先庚;梁小利               护理研究        2021-05-08  NaN   483.0   \n",
       "19       佘蕊;张宁池;王艳茹;郭丹丹;马文洁               中国电力        2021-05-05  NaN    85.0   \n",
       "20              尹璐; 安维复; 刘进               高教探索        2021-05-05  NaN    85.0   \n",
       "21                 王永红; 王诗瑶             河南农业科学  2021-04-30 13:34  NaN   145.0   \n",
       "22                 朱依娜; 何光喜              科学学研究  2021-04-26 08:30  NaN   264.0   \n",
       "23     聂鹏程; 张慧; 耿洪良; 王铮; 何勇   浙江大学学报(农业与生命科学版)  2021-04-25 14:13  NaN  1568.0   \n",
       "24                       杨欣               教育学报        2021-04-25  NaN    76.0   \n",
       "25                  王洪才; 田芬      西北师大学报(社会科学版)  2021-04-22 17:56  NaN   230.0   \n",
       "26      苏盛; 汪干; 刘亮; 陈清清; 王坤              高电压技术  2021-04-20 09:54  NaN   274.0   \n",
       "27                      冉凌宇               出版广角        2021-04-15  NaN   231.0   \n",
       "28       李苗钰; 杜忠昊; 刘雨彤; 牛思莹           西北工业大学学报        2021-04-15  NaN   118.0   \n",
       "29            刘万存; 袁亮文; 高永光              热加工工艺  2021-04-12 09:42  NaN    93.0   \n",
       "30     蔡婷; 林晖; 陈武辉; 郑子彬; 余阳               软件学报        2021-04-08  NaN   283.0   \n",
       "31                      杨绪辉             中国电化教育        2021-04-08  NaN   328.0   \n",
       "32        周永章;左仁广;刘刚;袁峰;毛先成         矿物岩石地球化学通报  2021-04-07 11:49  NaN   465.0   \n",
       "33                      周子荷             开放教育研究        2021-04-01  NaN   315.0   \n",
       "34                   黄巍; 唐友           吉林农业大学学报  2021-03-31 10:56  1.0   423.0   \n",
       "35            刘书伦; 彭高辉; 贾宝华               北方园艺        2021-03-30  NaN   254.0   \n",
       "36                 何清波; 姜添曦           上海交通大学学报        2021-03-28  NaN    23.0   \n",
       "37                      严振亚             新疆社会科学        2021-03-25  NaN   106.0   \n",
       "38                 黄鸣奋; 游长冬  江西师范大学学报(哲学社会科学版)        2021-03-25  NaN    16.0   \n",
       "39                      丁波涛               情报学报        2021-03-24  NaN   209.0   \n",
       "40  路永明; 孟繁欣; 李嘉琪; 林子阳; 王宁波        水利水电技术(中英文)        2021-03-20  NaN    83.0   \n",
       "41                      刘桂志               油气储运  2021-03-18 16:35  NaN   114.0   \n",
       "42        郑志明; 吕金虎; 韦卫; 唐绍婷          中国科学:信息科学  2021-03-18 08:50  NaN   188.0   \n",
       "43             查煜坤; 智慧; 房小彤           北京邮电大学学报  2021-03-16 11:27  NaN   176.0   \n",
       "44                      胡小伟    湖北大学学报(哲学社会科学版)        2021-03-16  1.0   839.0   \n",
       "45                 王永红; 王诗瑶               北方园艺        2021-03-15  NaN   100.0   \n",
       "46                      曲海娟               农业经济        2021-03-15  NaN    92.0   \n",
       "47                       诸廉               中州学刊        2021-03-15  NaN    74.0   \n",
       "48          孟峰; 张磊; 赵子未; 洪维              工矿自动化        2021-03-15  NaN   136.0   \n",
       "49         赵立军;李强;陈爽;李胜伶;徐鹏              农机化研究        2021-03-12  NaN   263.0   \n",
       "\n",
       "    操作  \n",
       "0   下载  \n",
       "1   下载  \n",
       "2   下载  \n",
       "3   下载  \n",
       "4   下载  \n",
       "5   下载  \n",
       "6   下载  \n",
       "7   下载  \n",
       "8   下载  \n",
       "9   下载  \n",
       "10  下载  \n",
       "11  下载  \n",
       "12  下载  \n",
       "13  下载  \n",
       "14  下载  \n",
       "15  下载  \n",
       "16  下载  \n",
       "17  下载  \n",
       "18  下载  \n",
       "19  下载  \n",
       "20  下载  \n",
       "21  下载  \n",
       "22  下载  \n",
       "23  下载  \n",
       "24  下载  \n",
       "25  下载  \n",
       "26  下载  \n",
       "27  下载  \n",
       "28  下载  \n",
       "29  下载  \n",
       "30  下载  \n",
       "31  下载  \n",
       "32  下载  \n",
       "33  下载  \n",
       "34  下载  \n",
       "35  下载  \n",
       "36  下载  \n",
       "37  下载  \n",
       "38  下载  \n",
       "39  下载  \n",
       "40  下载  \n",
       "41  下载  \n",
       "42  下载  \n",
       "43  下载  \n",
       "44  下载  \n",
       "45  下载  \n",
       "46  下载  \n",
       "47  下载  \n",
       "48  下载  \n",
       "49  下载  "
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "element = driver.find_element_by_id('gridTable')\n",
    "page_html = element.get_attribute('innerHTML')\n",
    "data = pd.read_html(page_html)[0]\n",
    "data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Pagination"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [],
   "source": [
    "import time"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'下一页'"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Locate the next-page button\n",
    "element = driver.find_element_by_id('PageNext')\n",
    "element.get_attribute('innerHTML')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'1/28'"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Read the page counter (current page / total pages)\n",
    "element = driver.find_element_by_xpath('//span[@class=\"countPageMark\"]')\n",
    "page_str = element.get_attribute('innerHTML')\n",
    "page_str"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['1', '28']"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Split the counter into [current_page, total_pages]; both items are still strings\n",
    "page_int = page_str.split('/')\n",
    "page_int"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28]\n"
     ]
    }
   ],
   "source": [
    "pages = list(range(1,int(page_int[1])+1))\n",
    "print(pages)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 27 clicks on the next-page button, starting from page 1, cover pages 2 to 28\n",
    "pages = list(range(1,28))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [],
   "source": [
    "表格_html = dict()\n",
    "main_content =\"\"\n",
    "element = None"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [],
   "source": [
    "def process_pages(pages):\n",
    "    \"\"\"Click the next-page button once per entry in `pages` and store each table's HTML.\"\"\"\n",
    "    for p in pages:\n",
    "        print(p, end='\\t')\n",
    "        跳转 = driver.find_element_by_id('PageNext')\n",
    "        跳转.click()\n",
    "        # Pause 8 to 9 seconds before reading the table\n",
    "        time.sleep(8 + random())\n",
    "        # Grab the table that holds the page's main data\n",
    "        element = driver.find_element_by_id('gridTable')\n",
    "        main_content = element.get_attribute('innerHTML')\n",
    "        表格_html[p] = main_content"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1\t2\t3\t4\t5\t6\t7\t8\t9\t10\t11\t12\t13\t14\t15\t16\t17\t18\t19\t20\t21\t22\t23\t24\t25\t26\t27\t"
     ]
    }
   ],
   "source": [
    "process_pages(pages)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>html_snippets</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>\\n&lt;div class=\"toolbar\"&gt;&lt;div id=\"countPageDiv\" ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>\\n&lt;div class=\"toolbar\"&gt;&lt;div id=\"countPageDiv\" ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>\\n&lt;div class=\"toolbar\"&gt;&lt;div id=\"countPageDiv\" ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>\\n&lt;div class=\"toolbar\"&gt;&lt;div id=\"countPageDiv\" ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>\\n&lt;div class=\"toolbar\"&gt;&lt;div id=\"countPageDiv\" ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>\\n&lt;div class=\"toolbar\"&gt;&lt;div id=\"countPageDiv\" ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>\\n&lt;div class=\"toolbar\"&gt;&lt;div id=\"countPageDiv\" ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>\\n&lt;div class=\"toolbar\"&gt;&lt;div id=\"countPageDiv\" ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>\\n&lt;div class=\"toolbar\"&gt;&lt;div id=\"countPageDiv\" ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>\\n&lt;div class=\"toolbar\"&gt;&lt;div id=\"countPageDiv\" ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>\\n&lt;div class=\"toolbar\"&gt;&lt;div id=\"countPageDiv\" ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>\\n&lt;div class=\"toolbar\"&gt;&lt;div id=\"countPageDiv\" ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>\\n&lt;div class=\"toolbar\"&gt;&lt;div id=\"countPageDiv\" ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>\\n&lt;div class=\"toolbar\"&gt;&lt;div id=\"countPageDiv\" ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>\\n&lt;div class=\"toolbar\"&gt;&lt;div id=\"countPageDiv\" ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>\\n&lt;div class=\"toolbar\"&gt;&lt;div id=\"countPageDiv\" ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>\\n&lt;div class=\"toolbar\"&gt;&lt;div id=\"countPageDiv\" ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>\\n&lt;div class=\"toolbar\"&gt;&lt;div id=\"countPageDiv\" ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>\\n&lt;div class=\"toolbar\"&gt;&lt;div id=\"countPageDiv\" ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>\\n&lt;div class=\"toolbar\"&gt;&lt;div id=\"countPageDiv\" ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>\\n&lt;div class=\"toolbar\"&gt;&lt;div id=\"countPageDiv\" ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>\\n&lt;div class=\"toolbar\"&gt;&lt;div id=\"countPageDiv\" ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>\\n&lt;div class=\"toolbar\"&gt;&lt;div id=\"countPageDiv\" ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>\\n&lt;div class=\"toolbar\"&gt;&lt;div id=\"countPageDiv\" ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>\\n&lt;div class=\"toolbar\"&gt;&lt;div id=\"countPageDiv\" ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>\\n&lt;div class=\"toolbar\"&gt;&lt;div id=\"countPageDiv\" ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>\\n&lt;div class=\"toolbar\"&gt;&lt;div id=\"countPageDiv\" ...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                        html_snippets\n",
       "1   \\n<div class=\"toolbar\"><div id=\"countPageDiv\" ...\n",
       "2   \\n<div class=\"toolbar\"><div id=\"countPageDiv\" ...\n",
       "3   \\n<div class=\"toolbar\"><div id=\"countPageDiv\" ...\n",
       "4   \\n<div class=\"toolbar\"><div id=\"countPageDiv\" ...\n",
       "5   \\n<div class=\"toolbar\"><div id=\"countPageDiv\" ...\n",
       "6   \\n<div class=\"toolbar\"><div id=\"countPageDiv\" ...\n",
       "7   \\n<div class=\"toolbar\"><div id=\"countPageDiv\" ...\n",
       "8   \\n<div class=\"toolbar\"><div id=\"countPageDiv\" ...\n",
       "9   \\n<div class=\"toolbar\"><div id=\"countPageDiv\" ...\n",
       "10  \\n<div class=\"toolbar\"><div id=\"countPageDiv\" ...\n",
       "11  \\n<div class=\"toolbar\"><div id=\"countPageDiv\" ...\n",
       "12  \\n<div class=\"toolbar\"><div id=\"countPageDiv\" ...\n",
       "13  \\n<div class=\"toolbar\"><div id=\"countPageDiv\" ...\n",
       "14  \\n<div class=\"toolbar\"><div id=\"countPageDiv\" ...\n",
       "15  \\n<div class=\"toolbar\"><div id=\"countPageDiv\" ...\n",
       "16  \\n<div class=\"toolbar\"><div id=\"countPageDiv\" ...\n",
       "17  \\n<div class=\"toolbar\"><div id=\"countPageDiv\" ...\n",
       "18  \\n<div class=\"toolbar\"><div id=\"countPageDiv\" ...\n",
       "19  \\n<div class=\"toolbar\"><div id=\"countPageDiv\" ...\n",
       "20  \\n<div class=\"toolbar\"><div id=\"countPageDiv\" ...\n",
       "21  \\n<div class=\"toolbar\"><div id=\"countPageDiv\" ...\n",
       "22  \\n<div class=\"toolbar\"><div id=\"countPageDiv\" ...\n",
       "23  \\n<div class=\"toolbar\"><div id=\"countPageDiv\" ...\n",
       "24  \\n<div class=\"toolbar\"><div id=\"countPageDiv\" ...\n",
       "25  \\n<div class=\"toolbar\"><div id=\"countPageDiv\" ...\n",
       "26  \\n<div class=\"toolbar\"><div id=\"countPageDiv\" ...\n",
       "27  \\n<div class=\"toolbar\"><div id=\"countPageDiv\" ..."
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.DataFrame([表格_html]).T\n",
    "df.columns = [\"html_snippets\"]\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [],
   "source": [
    "网站 = \"中国知网\"\n",
    "fn = { \"output\" : { \"htm_snippets\": \"data_raw_src/知网_htm_snippets_{网站}.tsv\"}\n",
    "     }"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Save the page HTML snippets to a tab-separated file\n",
    "filename = fn[\"output\"][\"htm_snippets\"]\n",
    "df.to_csv(filename.format(网站=网站), sep=\"\\t\", encoding=\"utf8\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [],
   "source": [
    "l_df = []\n",
    "for p in pages:\n",
    "    表格 = pd.read_html(表格_html[p])[0]\n",
    "    l_df.append(表格)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Unnamed: 0</th>\n",
       "      <th>篇名</th>\n",
       "      <th>作者</th>\n",
       "      <th>刊名</th>\n",
       "      <th>发表时间</th>\n",
       "      <th>被引</th>\n",
       "      <th>下载</th>\n",
       "      <th>操作</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>基于物联网技术的高校化学实验室安全监管系统的设计与实现</td>\n",
       "      <td>高文红; 孙欢; 韩晓敏; 陈剑波</td>\n",
       "      <td>实验技术与管理</td>\n",
       "      <td>2021-06-24 17:18</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>人工智能范式的革命与通用智能理论的创生  网络首发</td>\n",
       "      <td>钟义信</td>\n",
       "      <td>智能系统学报</td>\n",
       "      <td>2021-06-22 15:09</td>\n",
       "      <td>NaN</td>\n",
       "      <td>146.0</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>基于工业物联网的智能矿山基础信息采集关键技术与平台</td>\n",
       "      <td>贺耀宜; 刘丽静; 赵立厂; 周李兵</td>\n",
       "      <td>工矿自动化</td>\n",
       "      <td>2021-06-21 15:36</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>基于北斗系统的物联网技术与应用</td>\n",
       "      <td>谢军; 庄建楼; 康成斌</td>\n",
       "      <td>南京航空航天大学学报</td>\n",
       "      <td>2021-06-15</td>\n",
       "      <td>NaN</td>\n",
       "      <td>77.0</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>基于知识图谱的区块链物联网领域研究分析</td>\n",
       "      <td>李嘉明; 赵阔; 屈挺; 刘晓翔</td>\n",
       "      <td>计算机科学</td>\n",
       "      <td>2021-06-15</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1299</th>\n",
       "      <td>1350</td>\n",
       "      <td>人工智能与认知研究的新进展</td>\n",
       "      <td>胡懋仁</td>\n",
       "      <td>自然辩证法研究</td>\n",
       "      <td>1994-05-18</td>\n",
       "      <td>4.0</td>\n",
       "      <td>560.0</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1300</th>\n",
       "      <td>1351</td>\n",
       "      <td>从思维科学看人工智能的研究</td>\n",
       "      <td>刘泉宝; 刘永清</td>\n",
       "      <td>计算机科学</td>\n",
       "      <td>1994-05-15</td>\n",
       "      <td>6.0</td>\n",
       "      <td>163.0</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1301</th>\n",
       "      <td>1352</td>\n",
       "      <td>人工智能的哲学思考</td>\n",
       "      <td>陶承德</td>\n",
       "      <td>学习论坛</td>\n",
       "      <td>1994-02-15</td>\n",
       "      <td>NaN</td>\n",
       "      <td>410.0</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1302</th>\n",
       "      <td>1353</td>\n",
       "      <td>人工智能研究中心</td>\n",
       "      <td>张孟杰</td>\n",
       "      <td>河北农业大学学报</td>\n",
       "      <td>1993-10-01</td>\n",
       "      <td>NaN</td>\n",
       "      <td>41.0</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1303</th>\n",
       "      <td>1354</td>\n",
       "      <td>归纳逻辑与人工智能相结合的研究问题</td>\n",
       "      <td>王雨田</td>\n",
       "      <td>哲学研究</td>\n",
       "      <td>1992-04-30</td>\n",
       "      <td>8.0</td>\n",
       "      <td>289.0</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>1354 rows × 8 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "      Unnamed: 0                           篇名                  作者          刊名  \\\n",
       "0              1  基于物联网技术的高校化学实验室安全监管系统的设计与实现   高文红; 孙欢; 韩晓敏; 陈剑波     实验技术与管理   \n",
       "1              2    人工智能范式的革命与通用智能理论的创生  网络首发                 钟义信      智能系统学报   \n",
       "2              3    基于工业物联网的智能矿山基础信息采集关键技术与平台  贺耀宜; 刘丽静; 赵立厂; 周李兵       工矿自动化   \n",
       "3              4              基于北斗系统的物联网技术与应用        谢军; 庄建楼; 康成斌  南京航空航天大学学报   \n",
       "4              5          基于知识图谱的区块链物联网领域研究分析    李嘉明; 赵阔; 屈挺; 刘晓翔       计算机科学   \n",
       "...          ...                          ...                 ...         ...   \n",
       "1299        1350                人工智能与认知研究的新进展                 胡懋仁     自然辩证法研究   \n",
       "1300        1351                从思维科学看人工智能的研究            刘泉宝; 刘永清       计算机科学   \n",
       "1301        1352                    人工智能的哲学思考                 陶承德        学习论坛   \n",
       "1302        1353                     人工智能研究中心                 张孟杰    河北农业大学学报   \n",
       "1303        1354            归纳逻辑与人工智能相结合的研究问题                 王雨田        哲学研究   \n",
       "\n",
       "                  发表时间   被引     下载  操作  \n",
       "0     2021-06-24 17:18  NaN    NaN  下载  \n",
       "1     2021-06-22 15:09  NaN  146.0  下载  \n",
       "2     2021-06-21 15:36  NaN    NaN  下载  \n",
       "3           2021-06-15  NaN   77.0  下载  \n",
       "4           2021-06-15  NaN    NaN  下载  \n",
       "...                ...  ...    ...  ..  \n",
       "1299        1994-05-18  4.0  560.0  下载  \n",
       "1300        1994-05-15  6.0  163.0  下载  \n",
       "1301        1994-02-15  NaN  410.0  下载  \n",
       "1302        1993-10-01  NaN   41.0  下载  \n",
       "1303        1992-04-30  8.0  289.0  下载  \n",
       "\n",
       "[1354 rows x 8 columns]"
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_url_out = pd.concat(l_df).reset_index(drop=True)\n",
    "# Put page 1 (`data`) in front of the remaining pages; index labels are kept as-is\n",
    "df_总表 = pd.concat([data, df_url_out])\n",
    "df_总表"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Unnamed: 0</th>\n",
       "      <th>篇名</th>\n",
       "      <th>作者</th>\n",
       "      <th>刊名</th>\n",
       "      <th>发表时间</th>\n",
       "      <th>被引</th>\n",
       "      <th>下载</th>\n",
       "      <th>操作</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>基于物联网技术的高校化学实验室安全监管系统的设计与实现</td>\n",
       "      <td>高文红; 孙欢; 韩晓敏; 陈剑波</td>\n",
       "      <td>实验技术与管理</td>\n",
       "      <td>2021-06-24 17:18</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>人工智能范式的革命与通用智能理论的创生  网络首发</td>\n",
       "      <td>钟义信</td>\n",
       "      <td>智能系统学报</td>\n",
       "      <td>2021-06-22 15:09</td>\n",
       "      <td>NaN</td>\n",
       "      <td>146.0</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>基于工业物联网的智能矿山基础信息采集关键技术与平台</td>\n",
       "      <td>贺耀宜; 刘丽静; 赵立厂; 周李兵</td>\n",
       "      <td>工矿自动化</td>\n",
       "      <td>2021-06-21 15:36</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>基于北斗系统的物联网技术与应用</td>\n",
       "      <td>谢军; 庄建楼; 康成斌</td>\n",
       "      <td>南京航空航天大学学报</td>\n",
       "      <td>2021-06-15</td>\n",
       "      <td>NaN</td>\n",
       "      <td>77.0</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>基于知识图谱的区块链物联网领域研究分析</td>\n",
       "      <td>李嘉明; 赵阔; 屈挺; 刘晓翔</td>\n",
       "      <td>计算机科学</td>\n",
       "      <td>2021-06-15</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1299</th>\n",
       "      <td>1350</td>\n",
       "      <td>人工智能与认知研究的新进展</td>\n",
       "      <td>胡懋仁</td>\n",
       "      <td>自然辩证法研究</td>\n",
       "      <td>1994-05-18</td>\n",
       "      <td>4.0</td>\n",
       "      <td>560.0</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1300</th>\n",
       "      <td>1351</td>\n",
       "      <td>从思维科学看人工智能的研究</td>\n",
       "      <td>刘泉宝; 刘永清</td>\n",
       "      <td>计算机科学</td>\n",
       "      <td>1994-05-15</td>\n",
       "      <td>6.0</td>\n",
       "      <td>163.0</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1301</th>\n",
       "      <td>1352</td>\n",
       "      <td>人工智能的哲学思考</td>\n",
       "      <td>陶承德</td>\n",
       "      <td>学习论坛</td>\n",
       "      <td>1994-02-15</td>\n",
       "      <td>NaN</td>\n",
       "      <td>410.0</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1302</th>\n",
       "      <td>1353</td>\n",
       "      <td>人工智能研究中心</td>\n",
       "      <td>张孟杰</td>\n",
       "      <td>河北农业大学学报</td>\n",
       "      <td>1993-10-01</td>\n",
       "      <td>NaN</td>\n",
       "      <td>41.0</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1303</th>\n",
       "      <td>1354</td>\n",
       "      <td>归纳逻辑与人工智能相结合的研究问题</td>\n",
       "      <td>王雨田</td>\n",
       "      <td>哲学研究</td>\n",
       "      <td>1992-04-30</td>\n",
       "      <td>8.0</td>\n",
       "      <td>289.0</td>\n",
       "      <td>下载</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>1354 rows × 8 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "      Unnamed: 0                           篇名                  作者          刊名  \\\n",
       "0              1  基于物联网技术的高校化学实验室安全监管系统的设计与实现   高文红; 孙欢; 韩晓敏; 陈剑波     实验技术与管理   \n",
       "1              2    人工智能范式的革命与通用智能理论的创生  网络首发                 钟义信      智能系统学报   \n",
       "2              3    基于工业物联网的智能矿山基础信息采集关键技术与平台  贺耀宜; 刘丽静; 赵立厂; 周李兵       工矿自动化   \n",
       "3              4              基于北斗系统的物联网技术与应用        谢军; 庄建楼; 康成斌  南京航空航天大学学报   \n",
       "4              5          基于知识图谱的区块链物联网领域研究分析    李嘉明; 赵阔; 屈挺; 刘晓翔       计算机科学   \n",
       "...          ...                          ...                 ...         ...   \n",
       "1299        1350                人工智能与认知研究的新进展                 胡懋仁     自然辩证法研究   \n",
       "1300        1351                从思维科学看人工智能的研究            刘泉宝; 刘永清       计算机科学   \n",
       "1301        1352                    人工智能的哲学思考                 陶承德        学习论坛   \n",
       "1302        1353                     人工智能研究中心                 张孟杰    河北农业大学学报   \n",
       "1303        1354            归纳逻辑与人工智能相结合的研究问题                 王雨田        哲学研究   \n",
       "\n",
       "                  发表时间   被引     下载  操作  \n",
       "0     2021-06-24 17:18  NaN    NaN  下载  \n",
       "1     2021-06-22 15:09  NaN  146.0  下载  \n",
       "2     2021-06-21 15:36  NaN    NaN  下载  \n",
       "3           2021-06-15  NaN   77.0  下载  \n",
       "4           2021-06-15  NaN    NaN  下载  \n",
       "...                ...  ...    ...  ..  \n",
       "1299        1994-05-18  4.0  560.0  下载  \n",
       "1300        1994-05-15  6.0  163.0  下载  \n",
       "1301        1994-02-15  NaN  410.0  下载  \n",
       "1302        1993-10-01  NaN   41.0  下载  \n",
       "1303        1992-04-30  8.0  289.0  下载  \n",
       "\n",
       "[1354 rows x 8 columns]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Save the results table to a local Excel file\n",
    "with pd.ExcelWriter('知网文章数据.xlsx', mode='w', engine=\"openpyxl\") as writer:\n",
    "    df_总表.to_excel(writer, sheet_name=\"知网\")\n",
    "display(df_总表)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Downloading export files and full texts (RefWorks/PDF/text)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Download round 1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]\n"
     ]
    }
   ],
   "source": [
    "# Export the RefWorks file (.txt) and download the articles.\n",
    "# At most 500 articles can be selected at once, so the job is split into 3 rounds.\n",
    "\n",
    "pages = list(range(1,11))\n",
    "print(pages)"
   ]
  },
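  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The 500-article limit can be turned into page ranges with a little arithmetic. The following is a minimal sketch rather than part of the original workflow: `page_batches` is a hypothetical helper, and the 50 articles per page and the 500-article cap are assumptions taken from the comments in this notebook.\n",
    "\n",
    "```python\n",
    "import math\n",
    "\n",
    "def page_batches(total_items, per_page=50, max_selected=500):\n",
    "    # Hypothetical helper: split the result pages into batches so that\n",
    "    # no batch selects more than max_selected articles.\n",
    "    n_pages = math.ceil(total_items / per_page)\n",
    "    pages_per_batch = max_selected // per_page\n",
    "    return [\n",
    "        list(range(start, min(start + pages_per_batch, n_pages + 1)))\n",
    "        for start in range(1, n_pages + 1, pages_per_batch)\n",
    "    ]\n",
    "\n",
    "batches = page_batches(1354)\n",
    "# First and last page of each batch\n",
    "print([(b[0], b[-1]) for b in batches])  # [(1, 10), (11, 20), (21, 28)]\n",
    "```\n",
    "\n",
    "For 1,354 articles this gives three batches, which matches the three download rounds used here."
   ]
  },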
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Go back to the first results page\n",
    "driver.find_element_by_id('total').click()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Clear the current selection\n",
    "driver.find_element_by_xpath('//*[@id=\"gridTable\"]/div[1]/div[2]/div[1]/a').click()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Select all 50 articles on the current page, then go to the next page\n",
    "def process_choose(pages):\n",
    "    for p in pages:\n",
    "        print(p, end='\\t')\n",
    "        全选 = driver.find_element_by_id('selectCheckAll1')  # select-all checkbox\n",
    "        全选.click()\n",
    "        time.sleep(8 + random())  # randomised pause between actions\n",
    "        跳转 = driver.find_element_by_id('PageNext')  # next-page link\n",
    "        跳转.click()\n",
    "        time.sleep(5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1\t2\t3\t4\t5\t6\t7\t8\t9\t10\t"
     ]
    }
   ],
   "source": [
    "process_choose(pages)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {},
   "outputs": [],
   "source": [
    "# # An error occurred (not a CAPTCHA), so the last two pages are selected by hand\n",
    "# driver.find_element_by_id('selectCheckAll1').click()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {},
   "outputs": [],
   "source": [
    "# driver.find_element_by_id('PageNext').click()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {},
   "outputs": [],
   "source": [
    "# driver.find_element_by_id('selectCheckAll1').click()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Open the 'Export and analysis' menu\n",
    "driver.find_element_by_xpath('//i[@class=\"icon-d\"]').click()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Choose 'Export literature'\n",
    "driver.find_element_by_xpath('//i[@class=\"icon-r\"]').click()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Click the RefWorks format (opens a new window)\n",
    "driver.find_element_by_xpath('//a[@exporttype=\"Refworks\"]').click()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 79,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['CDwindow-831891B401192DCD264D5199DFB72E58',\n",
       " 'CDwindow-D7A8619E1C268C02313F162342670BF7',\n",
       " 'CDwindow-2A8C93989165DA5B5C40F7EAFB1B028E']"
      ]
     },
     "execution_count": 79,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# List all window handles\n",
    "driver.window_handles"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Export the .txt file\n",
    "driver.find_element_by_xpath('//i[@class=\"icon icon-export\"]').click()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 80,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Close the window that is no longer needed\n",
    "driver.close()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 87,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "<ipython-input-87-0188c2a7ff70>:2: DeprecationWarning: use driver.switch_to.window instead\n",
      "  driver.switch_to_window(driver.window_handles[1])\n"
     ]
    }
   ],
   "source": [
    "# Switch back to the results window\n",
    "driver.switch_to.window(driver.window_handles[1])"
   ]
  },
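  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Switching by a fixed index such as `window_handles[2]` depends on how many windows happen to be open. As an alternative, here is a minimal sketch that switches to whichever handle is new. `switch_to_new_window` is a hypothetical helper, not something the notebook defines; it only relies on `driver.window_handles` and `driver.switch_to.window`.\n",
    "\n",
    "```python\n",
    "def switch_to_new_window(driver, known_handles):\n",
    "    # Hypothetical helper: switch to the first window handle that is not\n",
    "    # in known_handles and return it; return None if nothing new is open.\n",
    "    for handle in driver.window_handles:\n",
    "        if handle not in known_handles:\n",
    "            driver.switch_to.window(handle)\n",
    "            return handle\n",
    "    return None\n",
    "\n",
    "# Intended usage (needs a live browser session, so it is left commented out):\n",
    "# before = set(driver.window_handles)\n",
    "# ...click something that opens a new window...\n",
    "# switch_to_new_window(driver, before)\n",
    "```\n",
    "\n",
    "Recording the handles before each click keeps the switch independent of the order in which windows were opened."
   ]
  },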
  {
   "cell_type": "code",
   "execution_count": 88,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Open the bulk-download page (opens a new window)\n",
    "driver.find_element_by_xpath('//li[@class=\"bulkdownload export\"]').click()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 89,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "<ipython-input-89-520070efe65b>:2: DeprecationWarning: use driver.switch_to.window instead\n",
      "  driver.switch_to_window(driver.window_handles[2])\n"
     ]
    }
   ],
   "source": [
    "# Switch to the newly opened window\n",
    "driver.switch_to.window(driver.window_handles[2])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 90,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Download the selected articles (500)\n",
    "driver.find_element_by_id('btn-download-all').click()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 91,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Close the window that is no longer needed\n",
    "driver.close()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Download round 2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 92,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "<ipython-input-92-0188c2a7ff70>:2: DeprecationWarning: use driver.switch_to.window instead\n",
      "  driver.switch_to_window(driver.window_handles[1])\n"
     ]
    }
   ],
   "source": [
    "# Switch back to the results window\n",
    "driver.switch_to.window(driver.window_handles[1])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 93,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Clear the current selection\n",
    "driver.find_element_by_xpath('//*[@id=\"gridTable\"]/div[1]/div[2]/div[1]/a').click()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 94,
   "metadata": {},
   "outputs": [],
   "source": [
    "driver.find_element_by_id('PageNext').click()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 95,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[11, 12, 13, 14, 15, 16, 17, 18, 19, 20]\n"
     ]
    }
   ],
   "source": [
    "# Round 2: pages 11-20\n",
    "pages = list(range(11,21))\n",
    "print(pages)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 96,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "11\t12\t13\t14\t15\t16\t17\t18\t19\t20\t"
     ]
    }
   ],
   "source": [
    "process_choose(pages)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 105,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Open the 'Export and analysis' menu\n",
    "driver.find_element_by_xpath('//i[@class=\"icon-d\"]').click()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 106,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Choose 'Export literature'\n",
    "driver.find_element_by_xpath('//i[@class=\"icon-r\"]').click()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 107,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Click the RefWorks format (opens a new window)\n",
    "driver.find_element_by_xpath('//a[@exporttype=\"Refworks\"]').click()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 100,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['CDwindow-831891B401192DCD264D5199DFB72E58',\n",
       " 'CDwindow-81AFA6135FB58FA96C8C3D502F2BD2DE',\n",
       " 'CDwindow-C40591B3D1D0C9848EE42FED8F2FF624']"
      ]
     },
     "execution_count": 100,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# List all window handles\n",
    "driver.window_handles"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 108,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "<ipython-input-108-520070efe65b>:2: DeprecationWarning: use driver.switch_to.window instead\n",
      "  driver.switch_to_window(driver.window_handles[2])\n"
     ]
    }
   ],
   "source": [
    "# Switch to the newly opened window\n",
    "driver.switch_to.window(driver.window_handles[2])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 109,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Export the .txt file\n",
    "driver.find_element_by_xpath('//i[@class=\"icon icon-export\"]').click()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 110,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "<ipython-input-110-520070efe65b>:2: DeprecationWarning: use driver.switch_to.window instead\n",
      "  driver.switch_to_window(driver.window_handles[2])\n"
     ]
    }
   ],
   "source": [
    "# Switch to the newly opened window\n",
    "driver.switch_to.window(driver.window_handles[2])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 111,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Close the window that is no longer needed\n",
    "driver.close()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 112,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "<ipython-input-112-0188c2a7ff70>:2: DeprecationWarning: use driver.switch_to.window instead\n",
      "  driver.switch_to_window(driver.window_handles[1])\n"
     ]
    }
   ],
   "source": [
    "# Switch back to the results window\n",
    "driver.switch_to.window(driver.window_handles[1])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 113,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Open the bulk-download page (opens a new window)\n",
    "driver.find_element_by_xpath('//li[@class=\"bulkdownload export\"]').click()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 114,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "<ipython-input-114-520070efe65b>:2: DeprecationWarning: use driver.switch_to.window instead\n",
      "  driver.switch_to_window(driver.window_handles[2])\n"
     ]
    }
   ],
   "source": [
    "# Switch to the newly opened window\n",
    "driver.switch_to.window(driver.window_handles[2])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 115,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Download the selected articles (500)\n",
    "driver.find_element_by_id('btn-download-all').click()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 116,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Close the window that is no longer needed\n",
    "driver.close()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Download round 3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 117,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "<ipython-input-117-0188c2a7ff70>:2: DeprecationWarning: use driver.switch_to.window instead\n",
      "  driver.switch_to_window(driver.window_handles[1])\n"
     ]
    }
   ],
   "source": [
    "# Switch back to the results window\n",
    "driver.switch_to.window(driver.window_handles[1])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 118,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Clear the current selection\n",
    "driver.find_element_by_xpath('//*[@id=\"gridTable\"]/div[1]/div[2]/div[1]/a').click()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 119,
   "metadata": {},
   "outputs": [],
   "source": [
    "driver.find_element_by_id('PageNext').click()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 124,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[21, 22, 23, 24, 25, 26, 27]\n"
     ]
    }
   ],
   "source": [
    "# Round 3: pages 21-27 (the last page is selected separately)\n",
    "pages = list(range(21,28))\n",
    "print(pages)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 126,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "21\t22\t23\t24\t25\t26\t27\t"
     ]
    }
   ],
   "source": [
    "process_choose(pages)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 129,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Select the articles on the last page\n",
    "driver.find_element_by_id('selectCheckAll1').click()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 130,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Open the 'Export and analysis' menu\n",
    "driver.find_element_by_xpath('//i[@class=\"icon-d\"]').click()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 131,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Choose 'Export literature'\n",
    "driver.find_element_by_xpath('//i[@class=\"icon-r\"]').click()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 132,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Click the RefWorks format (opens a new window)\n",
    "driver.find_element_by_xpath('//a[@exporttype=\"Refworks\"]').click()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 133,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['CDwindow-831891B401192DCD264D5199DFB72E58',\n",
       " 'CDwindow-81AFA6135FB58FA96C8C3D502F2BD2DE',\n",
       " 'CDwindow-F4F667B8997320B475D959C4BAF23C05']"
      ]
     },
     "execution_count": 133,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# List all window handles\n",
    "driver.window_handles"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 134,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "<ipython-input-134-520070efe65b>:2: DeprecationWarning: use driver.switch_to.window instead\n",
      "  driver.switch_to_window(driver.window_handles[2])\n"
     ]
    }
   ],
   "source": [
    "# Switch to the newly opened window\n",
    "driver.switch_to.window(driver.window_handles[2])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 135,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Export the .txt file\n",
    "driver.find_element_by_xpath('//i[@class=\"icon icon-export\"]').click()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 141,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Close the window that is no longer needed\n",
    "driver.close()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 142,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "<ipython-input-142-0188c2a7ff70>:2: DeprecationWarning: use driver.switch_to.window instead\n",
      "  driver.switch_to_window(driver.window_handles[1])\n"
     ]
    }
   ],
   "source": [
    "# Switch back to the results window\n",
    "driver.switch_to.window(driver.window_handles[1])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 143,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Open the bulk-download page (opens a new window)\n",
    "driver.find_element_by_xpath('//li[@class=\"bulkdownload export\"]').click()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 144,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "<ipython-input-144-520070efe65b>:2: DeprecationWarning: use driver.switch_to.window instead\n",
      "  driver.switch_to_window(driver.window_handles[2])\n"
     ]
    }
   ],
   "source": [
    "# Switch to the newly opened window\n",
    "driver.switch_to.window(driver.window_handles[2])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 145,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Download the selected articles (final batch)\n",
    "driver.find_element_by_id('btn-download-all').click()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Handling CAPTCHAs: downloading PDFs"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Solving the CAPTCHA that may appear when downloading an article"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 104,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'x': 993, 'y': 725}\n",
      "{'height': 34, 'width': 74}\n"
     ]
    }
   ],
   "source": [
    "# Take a screenshot and crop out the CAPTCHA so it can be sent to an OCR API\n",
    "from PIL import Image\n",
    "\n",
    "# 1. Screenshot the whole page\n",
    "driver.save_screenshot(r\"C:\\Users\\46547\\Desktop\\web_ming_final\\image\\下载.png\")\n",
    "# 2. Locate the CAPTCHA image element\n",
    "code_ele = driver.find_element_by_xpath('//*[@id=\"changeVercode\"]')\n",
    "# 3. Element location: a dict with keys 'x' and 'y', the top-left corner of the image\n",
    "print(code_ele.location)\n",
    "# 4. Element size: a dict with keys 'height' and 'width'\n",
    "print(code_ele.size)\n",
    "# 5. Compute the crop box (the extra offsets are specific to this screen setup)\n",
    "x0 = code_ele.location[\"x\"] + 250\n",
    "y0 = code_ele.location[\"y\"]\n",
    "x1 = code_ele.size[\"width\"] + x0 + 2000\n",
    "y1 = code_ele.size[\"height\"] + y0 + 3000\n",
    "img = Image.open(r\"C:\\Users\\46547\\Desktop\\web_ming_final\\image\\下载.png\")\n",
    "image = img.crop((x0, y0, x1, y1))  # left, upper, right, lower\n",
    "image.save(r\"C:\\Users\\46547\\Desktop\\web_ming_final\\image\\验证码＿下载.png\")  # save the cropped CAPTCHA image"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 105,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Recognise the download CAPTCHA with the xfyun general OCR web API\n",
    "import requests\n",
    "import time\n",
    "import hashlib\n",
    "import base64\n",
    "import json\n",
    "\n",
    "def parse_img(APPID, API_KEY):\n",
    "    curTime = str(int(time.time()))\n",
    "    # Supported languages, and whether to return text positions (off by default)\n",
    "    param = {\"language\": \"cn|en\", \"location\": \"false\"}\n",
    "    param = json.dumps(param)\n",
    "    paramBase64 = base64.b64encode(param.encode('utf-8'))\n",
    "\n",
    "    # Checksum: MD5 of API_KEY + curTime + the base64-encoded parameters\n",
    "    m2 = hashlib.md5()\n",
    "    str1 = API_KEY + curTime + str(paramBase64, 'utf-8')\n",
    "    m2.update(str1.encode('utf-8'))\n",
    "    checkSum = m2.hexdigest()\n",
    "\n",
    "    # Assemble the HTTP request headers\n",
    "    header = {\n",
    "        'X-CurTime': curTime,\n",
    "        'X-Param': paramBase64,\n",
    "        'X-Appid': APPID,\n",
    "        'X-CheckSum': checkSum,\n",
    "        'Content-Type': 'application/x-www-form-urlencoded; charset=utf-8',\n",
    "    }\n",
    "    URL = \"http://webapi.xfyun.cn/v1/service/v1/ocr/general\"\n",
    "    # Send the cropped CAPTCHA image as a base64 string\n",
    "    with open(r'C:\\Users\\46547\\Desktop\\web_ming_final\\image\\验证码＿下载.png', 'rb') as f:\n",
    "        img = str(base64.b64encode(f.read()), 'utf-8')\n",
    "    data = {'image': img}\n",
    "    r = requests.post(URL, data=data, headers=header)\n",
    "    ret = str(r.content, 'utf-8')\n",
    "    # Take the first recognised word of the first line\n",
    "    result = json.loads(ret)['data']['block'][0]['line'][0]['word'][0]['content']\n",
    "    return result\n"
   ]
  },
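  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The request signing inside `parse_img` can be looked at on its own. The sketch below only mirrors the header construction used in this notebook: `build_ocr_headers` is a hypothetical name, the credentials are placeholders, and nothing beyond what `parse_img` already does is assumed about the OCR service.\n",
    "\n",
    "```python\n",
    "import base64\n",
    "import hashlib\n",
    "import json\n",
    "\n",
    "def build_ocr_headers(app_id, api_key, cur_time):\n",
    "    # Hypothetical helper mirroring the header construction in parse_img.\n",
    "    param = json.dumps({'language': 'cn|en', 'location': 'false'})\n",
    "    param_b64 = base64.b64encode(param.encode('utf-8')).decode('utf-8')\n",
    "    # Checksum = MD5 hex digest of api_key + cur_time + base64 parameters\n",
    "    raw = (api_key + cur_time + param_b64).encode('utf-8')\n",
    "    check_sum = hashlib.md5(raw).hexdigest()\n",
    "    return {\n",
    "        'X-CurTime': cur_time,\n",
    "        'X-Param': param_b64,\n",
    "        'X-Appid': app_id,\n",
    "        'X-CheckSum': check_sum,\n",
    "        'Content-Type': 'application/x-www-form-urlencoded; charset=utf-8',\n",
    "    }\n",
    "\n",
    "# Placeholder credentials and timestamp, for illustration only\n",
    "headers = build_ocr_headers('my-app-id', 'my-api-key', '1624500000')\n",
    "print(sorted(headers))\n",
    "```\n",
    "\n",
    "Keeping the signing in a pure function makes it possible to check the checksum without sending a request."
   ]
  },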
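  {
   "cell_type": "code",
   "execution_count": 110,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Replace these with your own APPID and API_KEY; keep the result for the next cell\n",
    "result = parse_img(\"427b1be6\",\"ba65f1334305e5cb63a4da6d3f2071f8\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 111,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Type the recognised text into the CAPTCHA input box and submit\n",
    "线框验证码=driver.find_element_by_xpath('//*[@id=\"vcode\"]')\n",
    "线框验证码.send_keys(result)\n",
    "driver.find_element_by_xpath('/html/body/div/form/dl/dd/button').click()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Wrapping the CAPTCHA recognition in a function"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "***Adjust APPID, API_KEY and the screen-specific crop offsets to your own setup.***  \n",
    "***Create the img folder and set the browser download path, then update the code below to match; otherwise it will not work.***"
   ]
  },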
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [],
   "source": [
    "def verification_code():\n",
    "        ###循环10次一般可以解决验证码，不够可再加\n",
    "        for i in range(0,10):\n",
    "            ##截图（创建文件夹用于保存图片）\n",
    "            driver.save_screenshot (r\"C:\\Users\\46547\\Desktop\\web_ming_final\\下载.png\")\n",
    "            #2.定位到验证码图片元素(调整位置)\n",
    "            code_ele =driver.find_element_by_xpath ('//*[@id=\"vImg\"]')\n",
    "            x0 = code_ele.location[\"x\"] +120\n",
    "            y0=code_ele.location[\"y\"] +70\n",
    "            x1 = code_ele.size[\"width\"] + x0 + 120\n",
    "            y1=code_ele.size[\"height\"] +y0\n",
    "            img = Image.open(open (r\"C:\\Users\\46547\\Desktop\\web_ming_final\\下载.png\",'rb'))\n",
    "            image = img.crop((x0,y0,x1,y1))#左、上、右、下\n",
    "            image.save(r\"C:\\Users\\46547\\Desktop\\web_ming_final\\验证码＿下载.png\")#将验证码图片保存为code_img.png为code_img.png\n",
    "        #调用api\n",
    "            sleep(1)\n",
    "            ####APPID=\"427b1be6\"\n",
    "            ####API_KEY=\"ba65f1334305e5cb63a4da6d3f2071f8\"\n",
    "            curTime = str(int(time.time()))\n",
    "        #  支持语言类型和是否开启位置定位(默认否)\n",
    "            param = {\"language\": \"cn|en\", \"location\": \"false\"}\n",
    "            param = json.dumps(param)\n",
    "            paramBase64 = base64.b64encode(param.encode('utf-8'))\n",
    "\n",
    "            m2 = hashlib.md5()\n",
    "            str1 = API_KEY + curTime + str(paramBase64,'utf-8')\n",
    "            m2.update(str1.encode('utf-8'))\n",
    "            checkSum = m2.hexdigest()\n",
    "\n",
    "        # 组装http请求头\n",
    "            header = {\n",
    "                'X-CurTime': curTime,\n",
    "                'X-Param': paramBase64,\n",
    "                'X-Appid': APPID,\n",
    "                'X-CheckSum': checkSum,\n",
    "                'Content-Type': 'application/x-www-form-urlencoded; charset=utf-8',}\n",
    "            URL = \"http://webapi.xfyun.cn/v1/service/v1/ocr/general\"\n",
    "            with open(r'C:\\Users\\46547\\Desktop\\web_ming_final\\验证码＿下载.png','rb') as f:\n",
    "                img = f.read()\n",
    "                img = str(base64.b64encode(img), 'utf-8')\n",
    "                data = {'image': img}\n",
    "            r = requests.post(URL, data=data, headers=header)\n",
    "            ret = str(r.content, 'utf-8')\n",
    "            ### 判断api输入的值是否符合条件（可能只有3位数，导致错误）\n",
    "            try:\n",
    "                result=json.loads(ret)['data']['block'][0]['line'][0]['word'][0]['content']\n",
    "                线框验证码=driver.find_element_by_xpath('//*[@id=\"vcode\"]')\n",
    "                线框验证码.clear()\n",
    "                线框验证码.send_keys(result)\n",
    "                driver.find_element_by_xpath('/html/body/div/form/dl/dd/button').click()\n",
    "                sleep(2)\n",
    "                ###失败则更换图片继续运行\n",
    "                try:\n",
    "                    driver.find_element_by_xpath('//*[@id=\"vImg\"]').click()\n",
    "                    driver.find_element_by_xpath('/html/body/div/form/dl/dd/button').click()\n",
    "                    driver.find_element_by_xpath ('//*[@id=\"vImg\"]') \n",
    "                ###成功或出现并发、错误则关闭下载页和详情页返回总链接页结束\n",
    "                except:\n",
    "                    driver.close()\n",
    "                    sleep(2)\n",
    "                    driver.switch_to_window(driver.window_handles[2])\n",
    "                    sleep(2)\n",
    "                    driver.close()\n",
    "                    driver.switch_to_window(driver.window_handles[1])\n",
    "                    break###成功则下载，不成功则pass（下一个）\n",
    "             ### 不符合则点击换验证码      \n",
    "            except:\n",
    "                driver.find_element_by_xpath('//*[@id=\"vImg\"]').click()\n",
    "            return"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 验证码问题最终封装函数代码"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 147,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "from time import sleep\n",
    "\n",
    "driver.switch_to.window(driver.window_handles[1])\n",
    "element = driver.find_element_by_xpath('//span[@class=\"total\"]')\n",
    "max_page = int(element.get_attribute(\"textContent\").replace(\"共\", \"\").replace(\"页\", \"\"))\n",
    "path = r\"D:\\知网_pdf\"  # browser download folder (change to your own)\n",
    "for a in range(1, max_page + 1):\n",
    "    # Open each article's detail page and download its PDF\n",
    "    all_当前页面的所有文章 = driver.find_elements_by_xpath('//td[@class=\"name\"]//a')\n",
    "    sleep(3)\n",
    "    for b in range(len(all_当前页面的所有文章)):\n",
    "        driver.find_elements_by_xpath('//td[@class=\"name\"]//a')[b].click()\n",
    "        driver.switch_to.window(driver.window_handles[2])\n",
    "        num = len(os.listdir(path))  # file count before the download\n",
    "        element = driver.find_element_by_xpath('//*[@id=\"pdfDown\"]')\n",
    "        driver.execute_script(\"arguments[0].click();\", element)\n",
    "        sleep(6)\n",
    "        nums = len(os.listdir(path))  # file count after the download\n",
    "        # A new file appeared, so the download started: close the detail page\n",
    "        if nums > num:\n",
    "            driver.close()\n",
    "            sleep(1)\n",
    "            driver.switch_to.window(driver.window_handles[1])\n",
    "        # Otherwise check whether a CAPTCHA page was opened\n",
    "        else:\n",
    "            driver.switch_to.window(driver.window_handles[-1])\n",
    "            driver.switch_to.window(driver.window_handles[-2])\n",
    "            driver.switch_to.window(driver.window_handles[-1])\n",
    "            try:\n",
    "                driver.find_element_by_xpath('//*[@id=\"vImg\"]')\n",
    "                verification_code()\n",
    "            # No CAPTCHA: close the current window and return to the results window\n",
    "            except Exception:\n",
    "                driver.close()\n",
    "                driver.switch_to.window(driver.window_handles[1])\n",
    "    # Go to the next results page, except after the last one\n",
    "    if a < max_page:\n",
    "        driver.find_element_by_xpath('//*[@id=\"PageNext\"]').click()"
   ]
  },
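  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The loop above detects a download by counting files before and after a fixed pause. As a hedged alternative, the sketch below polls the folder until a finished file shows up. `wait_for_new_file` is a hypothetical helper, and skipping names that end in `.crdownload` assumes Chrome is the browser, since Chrome uses that suffix for partial downloads.\n",
    "\n",
    "```python\n",
    "import os\n",
    "import time\n",
    "\n",
    "def wait_for_new_file(folder, before, timeout=30.0, poll=0.5):\n",
    "    # Hypothetical helper: poll folder until a file appears that is not in\n",
    "    # before and is not a partial Chrome download; return the new names.\n",
    "    deadline = time.time() + timeout\n",
    "    while True:\n",
    "        new = [\n",
    "            name for name in os.listdir(folder)\n",
    "            if name not in before and not name.endswith('.crdownload')\n",
    "        ]\n",
    "        if new:\n",
    "            return sorted(new)\n",
    "        if time.time() >= deadline:\n",
    "            return []  # nothing arrived in time\n",
    "        time.sleep(poll)\n",
    "\n",
    "# Intended usage (folder name is whatever your browser downloads to):\n",
    "# before = set(os.listdir(path))\n",
    "# ...click the PDF download link...\n",
    "# downloaded = wait_for_new_file(path, before)\n",
    "```\n",
    "\n",
    "An empty list then plays the role of the `else` branch, where the CAPTCHA check takes over."
   ]
  },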
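  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Final summary"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Problems encountered in this final project:\n",
    "1. A single bulk download is limited to 500 articles;\n",
    "2. As the number and frequency of downloads grow, a CAPTCHA is very likely to appear;\n",
    "3. The later steps switch between many windows, so windows that are no longer needed have to be closed along the way."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Solutions:\n",
    "1. Split the 1,354 articles into 3 download rounds.\n",
    "2. Call an OCR API to solve the CAPTCHAs.\n",
    "3. Close each window as soon as it is no longer needed."
   ]
  }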
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.3"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
